38 research outputs found
LEAGUE: Guided Skill Learning and Abstraction for Long-Horizon Manipulation
To assist with everyday human activities, robots must solve complex
long-horizon tasks and generalize to new settings. Recent deep reinforcement
learning (RL) methods show promise in fully autonomous learning, but they
struggle to reach long-term goals in large environments. On the other hand,
Task and Motion Planning (TAMP) approaches excel at solving and generalizing
across long-horizon tasks, thanks to their powerful state and action
abstractions. But they assume predefined skill sets, which limits their
real-world applications. In this work, we combine the benefits of these two
paradigms and propose an integrated task planning and skill learning framework
named LEAGUE (Learning and Abstraction with Guidance). LEAGUE leverages the
symbolic interface of a task planner to guide RL-based skill learning and
creates abstract state space to enable skill reuse. More importantly, LEAGUE
learns manipulation skills in-situ of the task planning system, continuously
growing its capability and the set of tasks that it can solve. We evaluate
LEAGUE on four challenging simulated task domains and show that LEAGUE
outperforms baselines by large margins. We also show that the learned skills
can be reused to accelerate learning in new task domains and transfer to a
physical robot platform.
Comment: Accepted to RA-L 202
Scene Graph Generation by Iterative Message Passing
Understanding a visual scene goes beyond recognizing individual objects in
isolation. Relationships between objects also constitute rich semantic
information about the scene. In this work, we explicitly model the objects and
their relationships using scene graphs, a visually-grounded graphical structure
of an image. We propose a novel end-to-end model that generates such structured
scene representation from an input image. The model solves the scene graph
inference problem using standard RNNs and learns to iteratively improve its
predictions via message passing. Our joint inference model can take advantage
of contextual cues to make better predictions on objects and their
relationships. The experiments show that our model significantly outperforms
previous methods for generating scene graphs on the Visual Genome dataset and
for inferring support relations on the NYU Depth v2 dataset.
Comment: CVPR 201
Constrained-Context Conditional Diffusion Models for Imitation Learning
Offline Imitation Learning (IL) is a powerful paradigm to learn visuomotor
skills, especially for high-precision manipulation tasks. However, IL methods
are prone to spurious correlation - expressive models may focus on distractors
that are irrelevant to action prediction - and are thus fragile in real-world
deployment. Prior methods have addressed this challenge by exploring different
model architectures and action representations. However, none were able to
balance between sample efficiency, robustness against distractors, and solving
high-precision manipulation tasks with complex action space. To this end, we
present Constrained-Context Conditional
Diffusion Model (C3DM), a diffusion model policy for
solving 6-DoF robotic manipulation tasks with high precision and ability to
ignore distractions. A key component of C3DM is a fixation step that helps the
action denoiser to focus on task-relevant regions around the predicted action
while ignoring distractors in the context. We empirically show that C3DM is
able to consistently achieve high success rate on a wide array of tasks,
ranging from table top manipulation to industrial kitting, that require varying
levels of precision and robustness to distractors. For details, please visit
https://sites.google.com/view/c3dm-imitation-learnin
NOD-TAMP: Multi-Step Manipulation Planning with Neural Object Descriptors
Developing intelligent robots for complex manipulation tasks in household and
factory settings remains challenging due to long-horizon tasks, contact-rich
manipulation, and the need to generalize across a wide variety of object shapes
and scene layouts. While Task and Motion Planning (TAMP) offers a promising
solution, its assumptions such as kinodynamic models limit applicability in
novel contexts. Neural object descriptors (NODs) have shown promise in object
and scene generalization but face limitations in addressing broader tasks. Our
proposed TAMP-based framework, NOD-TAMP, extracts short manipulation
trajectories from a handful of human demonstrations, adapts these trajectories
using NOD features, and composes them to solve broad long-horizon tasks.
Validated in a simulation environment, NOD-TAMP effectively tackles varied
challenges and outperforms existing methods, establishing a cohesive framework
for manipulation planning. For videos and other supplemental material, see the
project website: https://sites.google.com/view/nod-tamp/
Zero-Shot Object Searching Using Large-scale Object Relationship Prior
Home-assistant robots have been a long-standing research topic, and one of
the biggest challenges is searching for required objects in housing
environments. Previous object-goal navigation requires the robot to search for
a target object category in an unexplored environment, which may not be
suitable for home-assistant robots that typically have some level of semantic
knowledge of the environment, such as the location of static furniture. In our
approach, we leverage this knowledge and the fact that a target object may be
located close to its related objects for efficient navigation. To achieve this,
we train a graph neural network using the Visual Genome dataset to learn the
object co-occurrence relationships and formulate the searching process as
iteratively predicting the possible areas where the target object may be
located. This approach is entirely zero-shot, meaning it doesn't require new
accurate object correlation in the test environment. We empirically show that
our method outperforms prior correlational object search algorithms. As our
ultimate goal is to build fully autonomous assistant robots for everyday use,
we further integrate the task planner for parsing natural language and
generating task-completing plans with object navigation to execute human
instructions. We demonstrate the effectiveness of our proposed pipeline in both
the AI2-THOR simulator and on a Stretch robot in a real-world environment.
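As a rough illustration of the search loop described in this abstract, the Python sketch below ranks candidate areas by how strongly their known static furniture co-occurs with the target object and visits them in that order. The co-occurrence table, the area and object names, and the functions `area_score` and `search` are all hypothetical placeholders. The abstract describes a graph neural network trained on Visual Genome; the hand-written lookup table here only stands in for it.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical co-occurrence scores, (target, landmark) -> score in [0, 1].
# These numbers are invented; the abstract learns such relationships with a GNN.
CO_OCCURRENCE: Dict[Tuple[str, str], float] = {
    ("mug", "sink"): 0.8,
    ("mug", "coffee_table"): 0.6,
    ("mug", "bed"): 0.1,
}


def area_score(target: str, landmarks: List[str]) -> float:
    """Score an area by its best-matching known landmark (0.0 if none match)."""
    return max((CO_OCCURRENCE.get((target, lm), 0.0) for lm in landmarks), default=0.0)


def search(
    target: str,
    areas: Dict[str, List[str]],
    contents: Dict[str, Set[str]],
) -> Tuple[Optional[str], List[str]]:
    """Visit areas in order of decreasing score until the target is found.

    `areas` maps an area name to its known static landmarks (prior knowledge).
    `contents` maps an area name to what is actually there (revealed on visit).
    Returns the area where the target was found (or None) and the visit order.
    """
    remaining = dict(areas)
    visited: List[str] = []
    while remaining:
        # Re-rank the unvisited areas and go to the most promising one.
        best = max(remaining, key=lambda a: area_score(target, remaining[a]))
        visited.append(best)
        if target in contents.get(best, set()):
            return best, visited
        del remaining[best]
    return None, visited


areas = {"kitchen": ["sink"], "living_room": ["coffee_table"], "bedroom": ["bed"]}
contents = {"living_room": {"mug"}}
found_in, order = search("mug", areas, contents)
print(found_in, order)
```

In this toy run the kitchen is checked first because its landmark has the highest prior score, and the search moves on to the next-ranked area when the target is not there.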
Neural Task Programming: Learning to Generalize Across Hierarchical Tasks
In this work, we propose a novel robot learning framework called Neural Task
Programming (NTP), which bridges the idea of few-shot learning from
demonstration and neural program induction. NTP takes as input a task
specification (e.g., video demonstration of a task) and recursively decomposes
it into finer sub-task specifications. These specifications are fed to a
hierarchical neural program, where bottom-level programs are callable
subroutines that interact with the environment. We validate our method in three
robot manipulation tasks. NTP achieves strong generalization across sequential
tasks that exhibit hierarchical and compositional structures. The experimental
results show that NTP learns to generalize well towards unseen tasks with
increasing lengths, variable topologies, and changing objectives.
Comment: ICRA 201
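The recursive decomposition described in this abstract can be pictured with a small Python sketch. It assumes a task specification is either a primitive name or a list of sub-specifications, and that bottom-level programs are plain callables. The `Spec` type, `run_program`, and the primitive names are hypothetical. In NTP the decomposition is produced by a learned neural program, whereas here it is hard-coded in the nested structure.

```python
from typing import Callable, Dict, List, Union

# A spec is either the name of a primitive or a list of finer sub-specs.
Spec = Union[str, List["Spec"]]


def run_program(
    spec: Spec,
    primitives: Dict[str, Callable[[], str]],
    trace: List[str],
) -> List[str]:
    """Recursively decompose `spec` and call primitives at the bottom level."""
    if isinstance(spec, str):
        # Bottom level: a callable subroutine that would act on the environment.
        trace.append(primitives[spec]())
        return trace
    for sub_spec in spec:
        # Higher level: hand each finer sub-spec to the same procedure.
        run_program(sub_spec, primitives, trace)
    return trace


primitives = {"pick": lambda: "pick", "place": lambda: "place"}
stack_two_blocks: Spec = [["pick", "place"], ["pick", "place"]]
trace = run_program(stack_two_blocks, primitives, [])
print(trace)
```

The same procedure handles longer or differently nested specifications without change, which is the structural property the abstract relies on for generalization.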
Evolutionary Curriculum Training for DRL-Based Navigation Systems
In recent years, Deep Reinforcement Learning (DRL) has emerged as a promising
method for robot collision avoidance. However, such DRL models often come with
limitations, such as adapting effectively to structured environments containing
various pedestrians. In order to solve this difficulty, previous research has
attempted a few approaches, including training an end-to-end solution by
integrating a waypoint planner with DRL and developing a multimodal solution to
mitigate the drawbacks of the DRL model. However, these approaches have
encountered several issues, including slow training times, scalability
challenges, and poor coordination among different models. To address these
challenges, this paper introduces a novel approach called evolutionary
curriculum training. The primary goal of
evolutionary curriculum training is to evaluate the collision avoidance model's
competency in various scenarios and create curricula to enhance its
insufficient skills. The paper introduces an innovative evaluation technique to
assess the DRL model's performance in navigating structured maps and avoiding
dynamic obstacles. Additionally, an evolutionary training environment generates
all the curricula to improve the DRL model's inadequate skills identified in the
previous evaluation. We benchmark the performance of our model across five
structured environments to validate the hypothesis that this evolutionary
training environment leads to a higher success rate and a lower average number
of collisions. Further details and results at our project website.
Comment: Robotics: Science and System
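The evaluate-then-target step described in this abstract might look roughly like the Python sketch below. It only shows how weak scenarios could be selected for the next curriculum and does not implement the evolutionary environment generation itself. The scenario names, the success-rate values, the 0.8 threshold, and the function `build_curriculum` are assumptions made for illustration.

```python
from typing import Dict, List


def build_curriculum(success_rates: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return scenarios whose success rate is below `threshold`, weakest first."""
    weak = [name for name, rate in success_rates.items() if rate < threshold]
    return sorted(weak, key=lambda name: success_rates[name])


# Hypothetical evaluation results per scenario type.
evaluation = {"corridor": 0.9, "crossing": 0.4, "doorway": 0.7}
curriculum = build_curriculum(evaluation)
print(curriculum)
```

Ordering the weakest scenarios first is one possible design choice; the abstract does not state how the generated curricula are prioritized.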
Human-in-the-Loop Task and Motion Planning for Imitation Learning
Imitation learning from human demonstrations can teach robots complex
manipulation skills, but is time-consuming and labor intensive. In contrast,
Task and Motion Planning (TAMP) systems are automated and excel at solving
long-horizon tasks, but they are difficult to apply to contact-rich tasks. In
this paper, we present Human-in-the-Loop Task and Motion Planning (HITL-TAMP),
a novel system that leverages the benefits of both approaches. The system
employs a TAMP-gated control mechanism, which selectively gives and takes
control to and from a human teleoperator. This enables the human teleoperator
to manage a fleet of robots, maximizing data collection efficiency. The
collected human data is then combined with an imitation learning framework to
train a TAMP-gated policy, leading to superior performance compared to training
on full task demonstrations. We compared HITL-TAMP to a conventional
teleoperation system -- users gathered more than 3x the number of demos given
the same time budget. Furthermore, proficient agents (75%+ success) could be
trained from just 10 minutes of non-expert teleoperation data. Finally, we
collected 2.1K demos with HITL-TAMP across 12 contact-rich, long-horizon tasks
and show that the system often produces near-perfect agents. Videos and
additional results at https://hitltamp.github.io.
Comment: Conference on Robot Learning (CoRL) 202
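A minimal Python sketch of the gating idea in this abstract follows. It assumes a plan is a sequence of named segments and that a boolean flag marks the contact-rich segments handed to the human teleoperator, with the planner handling the rest. The `Segment` class, the `contact_rich` flag, and `gated_rollout` are hypothetical names, and the abstract does not specify how the real system decides when to hand over control.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    contact_rich: bool  # assumption: True means control goes to the human


def gated_rollout(plan: List[Segment]) -> List[str]:
    """Walk through the plan and record who controls each segment."""
    log: List[str] = []
    for segment in plan:
        controller = "human" if segment.contact_rich else "tamp"
        log.append(f"{controller}:{segment.name}")
    return log


plan = [
    Segment("reach", contact_rich=False),
    Segment("insert", contact_rich=True),
    Segment("retract", contact_rich=False),
]
log = gated_rollout(plan)
print(log)
```

Because the human is only needed for the flagged segments, one operator could in principle serve several robots in turn while the planner runs the remaining segments, which is the efficiency argument the abstract makes.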